Metagenomic analysis of the honey bee queen microbiome reveals low bacterial diversity and Caudoviricetes phages

ABSTRACT In eusocial insects, the health of the queens—the colony founders and sole reproductive females—is a primary determinant for colony success. Queen failure in the honey bee Apis mellifera, for example, is a major concern of beekeepers who annually suffer colony losses, necessitating a greater knowledge of queen health. Several studies on the microbiome of honey bees have characterized its diversity and shown its importance for the health of worker bees, the female non-reproductive caste. However, the microbiome of workers differs from that of queens, which, in comparison, is still poorly studied. Thus, direct investigations of the queen microbiome are required to understand colony-level microbiome assembly, functional roles, and evolution. Here, we used metagenomics to comprehensively characterize the honey bee queen microbiome. Comparing samples from different geographic locations and breeder sources, we show that the microbiome of queens is mostly shaped by the environment experienced since early life and is predicted to play roles in the breakdown of the diet and protection from pathogens and xenobiotics. We also reveal that the microbiome of queens comprises only four candidate core bacterial species, Apilactobacillus kunkeei, Lactobacillus apis, Bombella apis, and Commensalibacter sp. Interestingly, in addition to bacteria, we show that bacteriophages infect the queen microbiome, for which Lactobacillaceae are predicted to be the main reservoirs. Together, our results provide the basis to understand the honey bee colony microbiome assemblage, can guide improvements in queen-rearing processes, and highlight the importance of considering bacteriophages for queen microbiome health and microbiome homeostasis in eusocial insects. IMPORTANCE The queen caste plays a central role in colony success in eusocial insects, as queens lay eggs and regulate colony behavior and development. Queen failure can cause colonies to collapse, which is one of the major concerns of beekeepers. Thus, understanding the biology behind the queen’s health is a pressing issue. Previous studies have shown that the bee microbiome plays an important role in worker bee health, but little is known about the queen microbiome and its function in vivo. Here, we characterized the queen microbiome, identifying for the first time the present species and their putative functions. We show that the queen microbiome has predicted nutritional and protective roles in queen association and comprises only four consistently present bacterial species. Additionally, we bring to attention the spread of phages in the queen microbiome, which increased in abundance in failing queens and may impact the fate of the colony.

of generations, and reproductive division of labor.Exemplars of insect societies are ants, wasps, and bees from the order Hymenoptera, the largest and most wellknown animal group with eusocial species.In eusocial insects, the reproductive female caste (queens) is responsible for laying eggs and regulating the colony's behavior and development.After mating, queens spend almost their entire life, which can reach several decades, inside the colony being fed by specialized workers.They are among the most fecund terrestrial animals; queens from some insect species can lay ~20,000 eggs per day (1), which may explain the success of some eusocial insect genera.The importance of queens also extends beyond their reproductive role.Queens maintain the colony homeostasis by managing the behavior of other colony members, such as attracting workers or inducing submissive behavior, modulating aggression, or inhibiting the production of new queens through pheromone signals (2)(3)(4)(5)(6).
The importance of queens for colony success becomes even more evident with the example of honey bees (Apis mellifera).In addition to the significance of honey bees for general ecosystem services in their natural range, managed colonies contribute billions to the agricultural economy annually in the United States, due to their pollina tion services (7).However, annual losses of honey bee-managed colonies have reduced reliability on them and affected food security.Queen failure and premature supersedure are consistently reported as the leading contributing factors for colony mortality (8).Honey bee queens can live 3-4 years, but recently, their diminished longevity requires the replacement of the queens almost every year, a practice also used preventively (9).The causes behind failing queens are still poorly understood, but they may include problems with development, insemination success, infection by parasites, exposure to xenobiotics, and adverse temperatures (10,11).Interestingly, the microbiome of insects, including honey bee foragers, is also known to modulate the host response to some of these stressors.For example, the worker gut microbiome can protect the host from parasites (12), and parasites can also shape the gut microbiome due to the infection (13); or the microbiome can buffer the effect of pesticides (14), and pesticides can shape the microbiomes (15), ultimately impacting colony fitness.Thus, the microbiome is an important trait that should be included in studies on the health of eusocial insect queens, and Apis mellifera is an excellent model system for these investigations.
Most of what is known about the microbiome of honey bees and its role in colony health comes from investigations on worker bees-the non-reproductive females that take care of the hive and when older leave the colony to collect pollen and nectar.Honey bee workers have a taxonomically simple and consistent gut microbiome among colonies around the world, comprising the core members Bifidobacterium, Snodgrassella, Gilliamella, and two groups of Lactobacillus, Firm-4 and Firm-5.This microbial commun ity impacts host health in multiple ways (see reference 16), and its acquisition and transmission occur mostly through interactions with individuals from the colony and the hive environment (17).Dysbiosis in this system is strongly correlated with poorer worker fitness and is characterized by shifts in the load of core microbes and the spread of opportunistic bacteria (18).More recently, fungi and bacteriophages' community characterization has also been included in studies of the bee microbiome (19)(20)(21)(22)(23).The role that phages play in molding the microbiome and impacting host health has been shown in unrelated model systems (24)(25)(26)(27), but in honey bees, their diversity and potential effects are understudied.
Curiously, there is evidence that honey bee queen microbes are distinct from those of workers from their own colony (28).Thus far, based on 16S rRNA amplicon-based studies, the honey bee queen microbiome appears to comprise mostly bacteria from the genera Lactobacillus, Bombella, and Commensalibacter.However, the queen gut microbiome is not as consistent as observed for worker bees, and the relative abundance of the associated bacteria is quite variable among queens from different colonies (28)(29)(30).In addition to the natural microbiome variability among queens, controlled experiments have shown that both age and early-bacteria colonizers coming from social interactions or rearing protocols also lead to differences in their microbiomes (29).A metagenomic approach, enabling species-level characterization and access to genomic information, could improve our understanding of the queen microbiome assemblage and its role in queen health.
Here, we sequenced the metagenomes of 18 queen gut samples from the Uni ted States and Canada to characterize their microbiome.These queen samples have associated metadata from previous studies (31,32), including both failing and healthy queens.We describe the queen gut microbial community, from bacteria to phages, and investigate the most important factors shaping it.In addition, we characterize the microbiome at a functional level and recover metagenome-assembled genomes to identify, at the species level, the candidate core microbiome.

The honey bee queen microbiome is dominated by bacteria of the Acetobac teraceae and Lactobacillaceae families
We sequenced a total of 18 queen gut metagenomes, ranging from 7 to 33 million paired reads per sample after quality trimming (Table S1).First, we asked which microbes were present in the community associated with the honey bee queen gut.We mapped trimmed reads against 18S rRNA and 16S rRNA databases for taxonomic identification of fungi and bacteria, respectively.As expected, based on the paucity of fungal reads recovered from other studies, no reads mapped to 18S rRNA; on the other hand, reads mapped to bacteria from at least five families, mostly Acetobacteraceae and Lactobacil laceae (Fig. 1A).At the genus level, the queen microbiomes varied extensively in the abundance of Acetobacteraceae (Bombella or Commensalibacter) and Lactobacillaceae (Apilactobacillus or Lactobacillus) (Fig. 1A).Other bacteria, known to often be a part of the worker bee gut microbiome, were also present in some of the queen samples, such as Bombilactobacillus, Bifidobacterium, and Frischella.The geographical location from which queens were sampled (PERMANOVA; City, bacterial family, P = 0.046, R 2 = 0.304, F = 2.165; bacterial genus P = 0.045, R 2 = 0.252, F = 1.667;State, bacterial family, P = 0.047, R 2 = 0.115, F = 2.270; bacterial genus, P = 0.039, R 2 = 0.121, F = 2.286) and the year of sampling (which is conflated with the geographic location, see Table S1) were two of the factors explaining the differences between the queen microbiomes both at the family and genus level.Additionally, the queen source-that is, the breeder who provided the queen-was also a factor explaining microbiome differences across samples (PERMA NOVA; bacterial family, P = 0.040, R 2 = 0.421, F = 2.488, bacterial genus, P = 0.039, R 2 = 0.357, F = 1.850).Queen source, although influenced by the genetics of the stocks, could also be influenced by environmental differences since the queen-rearing environment and protocol used by the beekeeper could impact queen microbiomes.Importantly, in our study, the location of these samples (State and City) is also confounded by the queen source, as beekeepers from the same area generally had the same queen suppliers (Table S1).
Due to the effect of the environment on the queen microbiome composition, differences in the microbiome of healthy or failing queens were only observed depend ing on their geographical location (PERMANOVA; interaction between State and Status, P = 0.047, R 2 = 0.121, F = 2.395).This interaction effect was observed at the family level; the queen core microbiome families, Lactobacillaceae and Acetobacteraceae, were among the taxa that explain most of these differences among the samples and their clustering in the principal coordinate analysis, PCoA (Fig. 1B).But also, Erwiniaceae, a non-core bacterial family of the honey bee microbiome, was more abundant in failing queens and contributed to the PCoA topology separating sample A0012 from the other U.S. samples (Fig. 1B).Interestingly, with the normalization of the bacteria coverage by the library read depth, we can also observe that failing queens, mostly from British Columbia (Canada), have higher proportions of bacterial reads per gut sample (Fig. 1A).This result, however, could also be affected by differences in diet consumption (environmental DNA) on the day of sampling, and future studies could use quantitative PCR to obtain absolute bacterial abundances.

Queen genetic background is not directly correlated with microbiome composition or health status
Because the queen source-that is, the breeder who provided the queen-was one of the factors explaining microbiome composition, we decided to further investigate if the queen's genetic background could be a direct predictor of its gut community.Honey bee queen population structure was assessed from genotype likelihoods of 7,342,540 polymorphic sites.Both population structure plots showed a clear genetic differentiation among the honey bee queens from Pennsylvania (USA) and British Columbia (Canada); although the admixture plot suggests some gene flow between populations (Fig. 2A).We also tested higher numbers of ancestral populations and there is no clear popula tion structure for queens from the same apiary or queen company supplier (Fig. S1).The principal component analysis (PCA) also cleanly delineates the samples based on the State factor and interestingly, the samples with mixed ancestry are those at the edges of each cluster, closer to the center of PC1 (Fig. 2B).Finally, we used a Mantel test to compare similarity/dissimilarity of queen genetic background and microbiome composition.The bacterial families or genera were not correlated with pairwise genetic distance among queens (bacterial family, R = 0.04113, P = 0.3906; bacterial genus R =

The candidate core microbiome of queens comprises four bacterial species
We recovered metagenome-assembled genomes (MAGs) of the core honey bee queen microbiome members.The co-assembly of all trimmed reads generated 177,846 contigs > 500 nt (Table S2), which were then grouped into bins.In total, eight bacterial MAGs with partial (<50%) to nearly complete (>90%) genomes were recovered and placed into a phylogenetic context to predict species (Table S3; Fig. 3; Fig. S2).Considering that MAGs will be retrieved from the abundant species in metagenomes (present in more samples, with more reads), among them we should be able to find the core members of the queen microbiome.Core microbiome members should be present in all healthy queens and some of the failing queens, while non-core members would be present only in a few queen samples.Single-copy core genes were used to estimate the coverage of MAGs, revealing that Bombella apis, Commensalibacter sp., Apilactobacillus kunkeei, and Lactobacillus apis were present in all queen samples (with the exception of L. apis, which was absent in one failing queen; Fig. 3; Table S4).In 11 out of 18 samples, these four bacteria species represent more than 50% of the microbiome (Table S4).Only the sample A0012, a failing queen, had an important reduction of the core microbiome (9%) and an increase in the abundance of Enterobacteriaceae.This opportunistic bacterium is part of the non-core MAGs, which includes the Lactobacillaceae Lactobacillus panisapium, the Bifidobacteriaceae Bifidobacterium apousia, the Orbaceae Frischella perrara, and the Enterobacteriaceae Tenebrionicola-like (Fig. S2).
We also used these MAGs to ask if a shift in the microbial community, i.e., dysbiosis, was observed in the abundance of the core species from the microbiome of failing queens.Although we can observe a tendency for a shift in species proportions when comparing queens with different health statuses, variance of relative coverage within the group was high, and there was no significant difference and no shift pattern in their proportions in failing queens across both locations (Fig. 3).At the family level, however, Lactobacillus in the state of Pennsylvania had a significant decrease in their proportion in failing queens (Wilcoxon rank test, P = 0.028).

Functional properties of the queen gut microbiome
To characterize the potential role of the microbiome in the queen host and interactions with and within the microbial community, we predicted protein functions from all bacterial contigs-thus avoiding biases due to MAG completion status or non-binned partial bacteria genomes, which may also have proteins with functional relevance.From all bacterial contigs, 23,833 proteins were predicted and 12,910 (54%) were assigned to Kyoto Encyclopedia of Genes and Genomes (KEGG) ortholog categories.The presence and absence of proteins in each sample were used to compare them in a clustered heatmap, where all functional categories in the queen microbiome are shown (Fig. 4).Overall, there was no broad functional category missing or overrepresented in samples, so no clustering by microbiome functional profile was observed for samples of the same geographic location (Pennsylvania vs British Columbia) or health status.Cluster 2 (C2, Fig. 4), however, includes the samples with more abundant non-core bacteria.In addition to proteins with general cell function, the main categories present were the metabolism of cofactors and vitamins, energy metabolism, carbohydrate metabolism, and amino acid metabolism.Protein counts from the energy metabolism category confirm that the microbes associated with honey bee queens mostly use oxidative phosphorylation for energy production (Fig. S3E).These microbes also harbor genes involved in the degradation of xenobiotics (e.g., cytochrome P450), the biosynthesis of antibiotics (e.g., monobactam and streptomycin), and biofilm formation pathways, which may play an important role in community regulation (Fig. S3M, B, and D, respectively).While genes for lysine biosynthesis were annotated in all samples, only sample A0012, a failing queen with depleted core-microbiome and increased abundance of the Enterobacteriaceae Tenebrionicola-like, has an increase in proteins involved in the degradation of this amino acid, known to be able to buffer poor bee nutrition (Fig. S3A) (33).

Bacteriophages of the class Caudoviricetes compose the queen microbiome and are found in higher abundance in failing queens
In addition to the search for fungi and bacteria in the microbiome of queens, we also wondered if bacteriophages, important modulators of microbial communities, could also compose the microbiome of the honey bee queens.A total of 74 phage viral operational taxonomic units (vOTUs) were retrieved, including three temperate and 71 putative virulent phages (Table S5).All the phage sequences with predicted genes were classified taxonomically as Caudoviricetes based on the similarity with viral marker genes in the database.Among the temperate phages, one is integrated into the genome of Lactoba cillus panisapium, and the other two are integrated into the genome of the Enterobacter iaceae, Tenebrionicola-like.We tested whether viruses integrated into their host genomes were being induced by comparing the coverage of temperate phages and the coverage of the respective host (Table S6).The coverage of these phages, however, was not greater than the coverage of the genome of their hosts, indicating that they are not being induced.
To predict putative hosts for the virulent phage vOTUs, we first conducted nucleotide similarity searches against CRISPR-cas arrays predicted in our MAGs.Genomes of B. apis, L. apis, L. panisapium, and Tenebrionicola-like have CRISPR-cas arrays with one to seven spacers per array (List S1), but none of the virulent phage sequences from the metagenomes matched these spacers.However, we still were able to predict putative hosts of 21 virulent phages based on the other approaches used, including matches with other CRISPR spacers' databases, BLASTs, and k-mer compositional analyses (Fig. 5; Table S5).The virulent phages were predicted to infect mostly Lactobacillaceae and Enterobacteriaceae.
Differences in the abundance of phage sequences among the samples were signifi cant for all factors being tested, including geographic location (Kruskal-Wallis rank test; P = 0.005, df = 3), age (Wilcoxon rank test; P = 8.41e−02), or queen source (Kruskal-Wallis rank test; P = 0.005, df = 4).Interestingly, the queen's health status was one of the most significant factors in terms of differences in phage abundance (Wilcoxon rank test; P = FIG 4 The microbiome functional profile suggests an important role for honey bee queen health and homeostasis.The heatmap shows scaled protein counts per KEGG category (rows), which include proteins involved in xenobiotics biodegradation and metabolism, amino acid metabolism biosynthesis (e.g., lysine), or biosynthesis of secondary metabolites (e.g., antibiotics).Samples (columns) were hierarchically clustered.Three main functional groups were formed (C1, C2, and C3), but they do not correlate with any of the variables tested in our investigations, such as health status or geographic location.Cluster 2, however, has an increase (dark red) in the representation of all KEGG categories due to the relative abundance of non-core bacteria in their microbiomes.0.029; British Columbia, P = 5.12e−05, Pennsylvania, P = 0.032), even when excluding temperate phages (Wilcoxon rank test; P = 0.04).
Additionally, we estimated the ratio between bacteria and phages in the samples, showing that although queens tend to have more bacteria than phages, the four samples with more phages than bacteria were failed queens (Table S7).Also, since phages need a host, we could expect a consistently increased abundance of phages in the samples with an increased abundance of their hosts.Following this rationale, we used a correlation analysis to potentially find putative hosts for phages with no putative host identified with the previous methods.As expected, all significant correlations among phages and bacteria were positive correlations, and most bacteria co-occurred precisely with phages predicted to infect their family, such as Lactobacilla ceae and Enterobacteriaceae.The result of this analysis indicates some phages that may be candidates to infect three members of the core microbiome, B. apis (e.g., k141_169038), A. kunkeei (e.g., k141_368347), and Commensalibacter (e.g., k141_456794; Fig. S4), although these bacteria also had significant, but weaker correlations with Lactobacillaceae-putative phages.

DISCUSSION
Investigations on the microbiome of eusocial insect queens are not as frequent as those on worker castes, despite the key role of the queen for superorganism fitness.However, studies conducted so far have revealed that the microbiome of queens-including species of termites, ants, and bees-usually differs from that of their workers (28,34,35).The lack of direct correlation between worker and queen microbiomes rein forces the importance of queen microbiome characterization to understand colonylevel microbiome assembly, functional roles, and evolution.Honey bees, especially, are economically relevant and have been important models for microbiome investiga tions; amplicon-based characterizations of the queen microbiome have shown that it is mostly composed of Lactobacillaceae and Acetobacteraceae (28)(29)(30).Here, using a metagenomic approach, we showed that the microbiome of honey bee queens (i) varies considerably in the abundance of these two bacterial families, with the environment having an important impact on these differences, (ii) the microbiome comprises four core bacterial species and is predicted to have a protective and nutritional role, and that (iii) temperate and virulent Caudoviricetes phages are part of the microbiome, for which core Lactobacillaceae may be important reservoirs and spreaders.
Previous studies on bee workers and queens have shown that, to some extent, their genetic background influences microbiome composition.In workers, these results were obtained by comparing the microbiome of A. mellifera subspecies, isolated from each other and inbred since 1980 (36).The evidence for the effect of queen genetic back ground on microbiome composition was shown by comparing two species of Apis (A. cerana and A. mellifera), which have their last common ancestor dated to approximately 6 mya (37,38).In the current study, we did not observe the same trends comparing queen genotypes of the same A. mellifera subspecies, meaning the genetic differences among the queens were not enough to explain the differences in the microbiome (Fig. 2).This result, however, does not eliminate the possibility of rare alleles playing a role in the queen microbiome assembly, and a larger sample size with queens from more populations could test this hypothesis.Interestingly, we instead found that the queen microbiome composition was shaped by the rearing environment, i.e., geographic location and queen breeder source.For worker bees, it is known that the environment landscape plays an important role in the microbiome composition (39).The effect of breeder source on queen microbiome composition was also observed in another study with honey bee queens (30) and, together with the lack of correlation between queen genetic background and microbiome composition, suggests a role of priority effects (arrival order and/or timing) on queen microbiome assembly.The effect of the environ ment on microbiome composition may also have masked differences in the microbiome of healthy and failing queens (Fig. 1).Dysbiosis can shift the microbiome to a different end community composition depending on previous colonizers and the factors causing the shift (18).Thus, to characterize dysbiosis patterns in failing queens, an experimental design controlling for environmental effects would be important in future investigations.Regardless, with our approach, we were able to observe a typical dysbiotic pattern in some of the failing queens, which was an increase in bacteria abundance and an increase in the proportion of non-core microbiome members.However, the changes observed in the microbiomes of the failing queens in our study could be a consequence of weak queens and not the direct cause of queen failure.
Our findings reveal that the microbiome of queens is very constrained, composed of only four candidate core bacterial species: Bombella apis, Commensalibacter sp., Apilactobacillus kunkeei, and Lactobacillus apis (Fig. 3).This represents even fewer bacterial members than the known simple core microbiome of worker bees, comprising five phylotypes with multiple species (40,41).Additionally, no fungi were detected in our queen samples, which is in contrast to previously published amplicon sequencing-based investigations with queens and worker bees (19,20,42,43).Importantly, this difference could be due to the fact that amplicon sequencing will detect extremely low abundance templates resulting from the honey bee diet.
Queens are the longest-lived members in a colony, they are larger than workers, and they have well-developed ovaries, thus their physiology itself already represents a different niche for bacteria to colonize.Additionally, the highly specialized diet of queens throughout their lives, comprising mainly royal jelly, may have facilitated the selection of a small number of core members.Queens are genetically identical to their sisters that became workers, i.e., female larvae are bipotent and can equally develop into queens or workers depending on their larval diet (royal jelly vs worker brood food).Royal jelly is composed of water, proteins, sugars, and lipids, but it is also very viscous, acidic, and imbibed with antimicrobial peptides, such as royalisin, jelleines, apismin, royalactin, and fatty acids, which together confer the antimicrobial role of the royal jelly (44,45).For example, royal jelly can inhibit the growth of bee pathogenic bacteria, such as Melissococcus plutonius and Paenibacillus alvei (46).Interestingly, royal jelly does not inhibit all bacteria; a recent study showed that Bombella apis, one of the core members of the queen microbiome, can withstand and even replicate in royal jelly (33).In this same study, however, Apilactobacillus kunkeei, did not show the same ability.In other host model systems where the microbiome is also composed of Acetobacteraceae and Lactobacillaceae, such as in Drosophila, the colonization of these bacterial families is enabled due to cross-feeding; Acetobacter pomorum uses the lactate produced by Lactobacillus plantarum to supply amino acids that are essential to L. plantarum (47).All the members of the queen microbiome can be isolated and cultivated in the laboratory, and future experiments could test if cross-feeding is behind the presence of these core bacteria in such an antimicrobial environment.It is also important to note that the royal jelly is produced, secreted, and offered to the queen by nurse bees, meaning the selection for these royal-jelly-resistant microbes is likely already occurring in the nurse bee hypopharyngeal glands.Moreover, it is conceivable that queens have different needs with regard to nutrition compared to workers, and thus they may require different nutritional symbionts.In fact, our functional analysis has shown that the microbiome of queens is equipped with genes involved in protein and carbohydrate metabolism (e.g., peptidases, lipid biosynthesis, fructose, and mannose metabolism; see Fig. 4; Fig. S3), likely supplementing primary components of the host's diet.
With our metagenomic approach, we were able to show for the first time that Caudoviricetes, a class of tailed bacteriophages, are part of the queen gut microbial community (Table S5).Previous studies have shown that Caudoviricetes are the main phages in the microbiome of honey bee workers (21)(22)(23), but the lack of similarity in the gut bacterial community of these two female castes has left open the question about the phages present in the queen microbiome.In the worker's gut, the main phage hosts are the core bacteria Bifidobacterium, Gilliamella, and Lactobacillus, and the non-core Bartonella (21)(22)(23).Among these hosts, Lactobacillus is the only genus that is also found in the queen microbiome and, interestingly, we show here it is the queen gut bacterium hosting the majority of phages (Fig. 5).Importantly, however, for many phages, the host prediction is difficult to impossible (21)(22)(23); here, 77% of the phage sequences did not have a host predicted.However, we indeed predicted CRISPR-Cas systems with spacers in the Bombella apis MAG, suggesting a history of infections by phages (List S1).Also, phage sequence distribution was consistent with the change in the microbiome due to environmental effects, and phages were also more abundant in the failing queens (Fig. 5; Table S7).This result points out that dysbiosis, usually detected via shifts in bacterial composition, can also be characterized by phage production and spread in the microbiome.We observed that some of the most abundant phage sequences in the queen microbiome are associated with a non-core bacterium, an Enterobacteriaceae, putting the native queen microbiome at risk.One of these most abundant phages is a temperate phage, based on the analysis of coverage compared to the host genome.If phages are induced or spread in the microbiome, they can not only directly kill their hosts but may also trigger immunological responses or even foster gene transfer between cells (48).The inherent complexity of a eusocial insect colony poses a challenge in investigations on the health of queens and, ultimately, of the colony (49).For future studies, a longitudinal approach could improve the resolution of the ecological interaction feedback between phages and bacteria over time and its impact on queen health or failure.

Honey bee queen samples
Honey bee queens were sampled in 2018 from colonies located in Pennsylvania, USA, and in 2019 from colonies in British Columbia, Canada.The samples from both locations are part of previously published data sets (31,32), in which the queens were classified according to their health status.Queens were scored as healthy when they showed no sign of supersedure cells, no drone brood in worker cells, no signs of disease, and had strong worker populations.Failing queens exhibited one or more of the following: drone brood in worker cells, spotty brood pattern, weak colony population, and supersedure cells.In addition to the health status, these samples have associated metadata that were used for the analysis, including queen age, source, ovary mass and size, and manage ment strategy (Table S1).All samples were stored at −70°C until dissection and further use.

Sample processing and shotgun sequencing
All queens were dissected on ice with 70% ethanol sterilized tools by gently pulling on the last abdominal tergite.DNA extraction was performed by a microbiome analytics company, Microbiome Insights, using the Qiagen MagAttract PowerSoil DNA KF Kit optimized for the Thermo Scientific KingFisher robot.Quality and quantity of DNA were checked on an Agilent 2200 TapeStation, and a total of 18 samples with a DNA integrity number > 6 were submitted for library construction and metagenome sequencing at the biotechnology company SeqCenter, Pittsburgh, PA, USA.Libraries were prepared using the Illumina DNA Prep kit and IDT 10 bp UDI indices and sequenced on an Illumina NextSeq 2000, producing 2 × 151-bp reads.Demultiplexing, quality control, and adapter trimming were performed with bcl-convert (v3.9.3) from Illumina.Sequencing data can be found within the NCBI BioProject PRJNA1007366.

Bacteria and fungi community characterization
Paired-end raw sequencing reads were first trimmed by length and quality using Trimmomatic v.0.36, with options "LEADING:28 TRAILING:28 SLIDINGWINDOW:6:25 MINLEN:75" (50).Bowtie2 v.2.4.2 was used to map trimmed individual sample reads to bacteria and fungi marker gene databases, using the options "--no-discordant --very-sen sitive" (51).The bacteria sequence database used was BEExact, a comprehensive 16S rRNA database of all bacteria taxa previously found associated with bees (52).The fungal sequence database used was SILVA 138.1 SSU, which contains 18S rRNA sequences (53).Samtools was used to transform mapping files and retrieve, with depth command, option "-a, " the coverage per base of the marker gene sequence (54).Coverage was summed for sequences from the same family or genus and normalized by the sequence length and library read depth (mean sequence coverage/normalization factor; Table S1).

Co-assembly and broad taxonomic assignment
Trimmed reads from all 18 samples were used in a co-assembly with MEGAHIT v.1.1.2(55).Contigs > 500 nt were binned into metagenome-assembled genomes (see the section Metagenome-assembled bacteria genomes) and used for broad taxonomic classification of samples.We first classified bee contigs by mapping a 500 nt random fragment of contigs with Bowtie2 v.2.4.2 against Apis mellifera genome, GCF_003254395.2_Amel_HAv3.1 (51).Contigs that did not map were used as queries in a BLASTx (56) against nr database from the NCBI, being classified according to the best hit taxonomy as bacteria, eukaryote/bee, virus, or unknown (Table S2).Contig coverage used for the following analyses was recovered by mapping trimmed individual sample reads back to all contigs with Bowtie2 v.2.4.2, using the options "--no-discordant --very-sensitive" (51).Samtools was used for file transformation, and the depth command with option "-a" was used to recover the coverage per base of the contigs (54).Mapping results were also used to plot the proportion of reads mapping to host or other taxa (Fig. S5).

Metagenome-assembled genomes
Contigs were grouped into bins with MetaBAT2 v.2.11.3 using default settings (57).CheckM v.1.1.6was used to assess the quality of the resulting MAGs (58).Bacterial MAGs with >30% completeness and <5% contamination were included in the following analyses.To confirm that dereplication of MAGs is not necessary, which is expected for co-assemblies, we ran the "dereplicate" command from dRep v.3.4.2 (59).No MAGs were clustered beyond the threshold of >90% average nucleotide identity from the primary dendrogram of pair-wise Mash distances between all MAGs.Each MAG was subjected to gene/protein prediction with Prokka v.1.14.6, using default options (60).In addition, we used CRISPRCasFinder with default parameters to identify CRISPR-Cas loci in contigs from all MAGs (61).CRISPR repeat spacers from all CRISPR arrays were extracted and added to a spacer database used for putative phage host discovery (see "Characterization of bacteriophages in the microbiome"; List S1).

Phylogenetic placement of MAGs and abundance
MAGs were placed into phylogenetic trees with related genomes for species-level taxonomic characterization.The best BLASTx results of the MAG contigs (see "Coassembly and broad taxonomic assignment") guided the choice of species reference genomes to retrieve from NCBI and include in the phylogenetic analysis, in addition to outgroups.Single-copy orthologs were recovered with OrthoFinder v.2.5.4 (62), aligned with MAFFT v.7.520 (63), and then concatenated for IQTREE v.2.2.0.3 maximum likelihood analysis, options -m TEST -B 1000 (64).Figtree v.1.4.4 was used for tree visualization.To estimate and compare MAG abundance across samples, the mean coverage of single-copy orthologs was retrieved from previous mapping output (see "Co-assembly and broad taxonomic assignment") with an in-house Perl script (https:// github.com/liliancaesarbio/general_scripts/)and normalized to the library read depth (mean sequence coverage/normalization factor; Table S1).The proportion of total reads mapping to each MAG was also recovered and made available in Table S8.

Microbiome functional characterization
All contigs classified taxonomically as bacteria, including MAG sequences, had pro teins predicted with Prokka using default settings (60).The mean coverage of each protein was retrieved from previous mapping (see "Co-assembly and broad taxo nomic assignment") with an in-house Perl script (https://github.com/liliancaesarbio/general_scripts/).Proteins were assigned for functionality at different levels of the KEGG database with EggNOG-mapper 2.1, using default parameters (65).Genes with >0.1× coverage were considered present.First, KEGG level B of description was retried for each KO ID, and categories with total samples sum of >700 coverage and 100 protein counts were considered.Gene counts for both KEGG level B and KEGG level C for each non-general function category were plotted with R v.3.6.3, using package pheatmap (66).To test for functional similarity between the samples, the heatmap was hierarchically clustered with hclust complete linkage method.

Characterization of bacteriophages in the microbiome
Sequences of bacteriophages were identified using three programs; geNomad was run with option "end-to-end" and a cutoff of >0.7 virus score (67), VirSorter2 v2.2.4 was run using options "--min-score 0.7 --hallmark-required-on-short" (68), and VIBRANT v1.2.1 was run using the default settings (69).For this analysis, contigs > 500 nt, classified as bacteria, virus, or unknown, were used as input, enabling the identification of virulent or temperate phages.For contigs of bacteria, only proviruses (temperate phages) were considered, since the other detected phage sequences may represent horizontal gene transfer events.No viral bins were generated or dereplication needed due to the co-assembly approach (see Metagenome-assembled genomes"); also, no bins were recovered with vRhyme v1.1.0,using default settings (70).CheckV "end-to-end" was used to estimate completeness of the final list of viral operational taxonomic units (71), and only phage vOTUs with at least one non-host gene were retained for the following analyses.The mean coverage of vOTUs was recovered from the depth output files previously obtained (see "Co-assembly and broad taxonomic assignment") using an in-house script (https://github.com/liliancaesarbio/general_scripts/).Coverage was estimated based only on the viral region of the contig since they are the result of a co-assembly, and in some samples that virus may not be present-as for the case of temperate phages.Candidate hosts were predicted with iPHoP v. 1.3.1 (72), BLASTn with options "-evalue 1e-3 -ungapped -perc_identity 95" against NCBI nt database, CRISPR spaces annotated from MAGs and also against the CrisprOpenDB (73).The last data base includes more than 9 billion spacers, as well as the ones annotated in previously published bee microbiome studies (21)(22)(23).

Statistical analysis
All graphs and analyses were run on R v.3.6.3 using the packages cited.We assessed beta diversity and clustering profiles for the 16S rRNA-based results with a PCoA ordination plot ran on Bray-Curtis dissimilarity matrices using package Ape (80), then plotted with ggplot2 (78).To test for the effects of geographic location (State and City), year, age, queen origin, management, health status, and fitness markers (ovary size and ovary mass) on the bacterial community composition, we ran PERMANOVA, 9,999 permutations, with the function adonis2 from Vegan package (81).The effect of the same factors on the proportion of core microbiome members and Caudoviricetes was tested with t-test or Wilcoxon rank test and Kruskal-Wallis rank test in the case data were not normally distributed with Shapiro-Wilk normality test.All P-values were adjusted with the false discovery rate method.To test for the co-occurrence of phages and bacteria, we ran a Spearman correlation analysis with package Hmisc (82), plotted with packages corrplot (83) and PerformanceAnalytics (84).To test for the correlation between host genetic background and microbiome composition, we used the Mantel test from package Vegan (81), with Spearman correlation and 9,999 permutations.The input matrices were the queen genotype likelihood covariance matrix or the queen pairwise genetic distance (see "Queen genetic background") and the Bray-Curtis dissimilarity of the gut microbiome composition at the family or genus level.

FIG 1
FIG 1 The microbiome of honey bee queens varies extensively in the abundance of the two most prevalent bacterial families, Acetobacteraceae and Lactobacillaceae.(A) Relative (above) and library read depth normalized (below) coverage of bacteria genera comprising the microbiome of healthy and failing honey bee queens from two different locations, Pennsylvania (USA) and British Columbia (Canada).Color shades represent bacteria families; Acetobacteraceae (yellow), Bifidobacteriaceae (green), Erwiniaceae (gray), Lactobacillaceae (purple), and Orbaceae (blue).(B) Principal coordinate analysis (PCoA) showing the distribution of queen samples based on the dissimilarity of relative proportions of the microbial communities at family (above) and genus level (below).Arrows highlight the bacteria with the strongest contribution to the ordination, and each axis shows the percentage of variation explaining.

FIG 2
FIG 2 Genetic background of the honey bee queens groups them in a population for each general geographical location and not by health status or microbiome composition.(A) Bar plot resulting from the SNP admixture analysis for K2 ancestral populations (y-axis shows the proportion of each ancestral population), clustering queens from Pennsylvania (PA) and British Columbia (BC).(B) PCA of the genotype likelihood covariance matrix shows the same population structure, separated in PC1, which explains 22% of the total variation.

FIG 3
FIG 3 The core microbiome of honey bee queens comprises only four species.(A) Phylogenies inferred with maximum likelihood for the MAGs of the two core Acetobacteraceae.Above, Bombella apis and relatives, inferred from the alignment of 927 single-copy orthologs (SCO).Below, Commensalibacter and relatives, inferred from the alignment of 708 SCO.(B) Phylogenies for the MAGs of the two core Lactobacillaceae.Above, Lactobacillus apis and relatives, inferred from the alignment of 80 SCO.Below, Apilactobacillus kunkeei and relatives, inferred from the alignment of 292 SCO.(A and B) On the side of each phylogeny is the relative coverage of the MAG in queens with different health statuses and geographical locations.Below the phylogenies of each bacterial family is the same plot configuration, but with the coverage sum of both MAGs.Significant statistical difference was only observed when comparing the abundance of core Lactobacillaceae among queens with different health statuses in Pennsylvania; there is a reduction of their abundance in failing queens (Wilcoxon rank test, *P = 0.028).

FIG 5
FIG5 Caudoviricetes phages spread opportunistically in the microbiome of failing queens.The coverage of phage vOTUs (dots, with a coverage >1) is shown for each queen sample, which was higher in failing queens (Wilcoxon rank test, P = 0.0297; PA, P = 0.032; BC, P = 5.12e−5).The color of the dot indicates the host where its genome is integrated or its putative host.